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Abstract. This paper investigates a fully unsupervised statistical method for edge preserving 
image restoration and compression using a spatial decomposition scheme. Smoothed maximum 
likelihood is used for local estimation of edge pixels from mixture parametric models of local 
templates. For the complementary smooth part the traditional L2-variational problem is solved 
in the Fourier domain with Thin Plate Spline (TPS) regularization [48]. It is well known that 
naive Fourier compression of the whole image fails to restore a piece-wise smooth noisy image 
satisfactorily due to Gibbs phenomenon f27]. Images are interpreted as relative frequency his- 
tograms of samples from bi-variate densities where the sample sizes might be unknown. The 
set of discontinuities is assumed to be completely unsupervised Lebesgue-null, compact subset 
of the plane in the continuous formulation of the problem. Proposed spatial decomposition 
uses a widely used topological concept, partition of unity [35146] . The decision on edge pixel 
neighborhoods are made based on the multiple testing procedure of [29132] . Statistical sum- 
mary of the final output is decomposed into two layers of information extraction, one for the 
subset of edge pixels and the other for the smooth region. Robustness is also demonstrated by 
applying the technique on noisy degradation of clean images. 
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1 Introduction 

1.1 Data generation process 

A monochromatic digital image can be regarded as a surface of the image intensity function 
defined at pixels of the image, with possible edges visually demarcating outlines of embedded 
objects in an otherwise smooth background. In a variety of applications of image analysis 
it is important to recognize presence of objects of interests in images. Identification of edge 
pixels plays an important role towards this objective. Edges can be classified into different 
categories depending on the behavior of the image in its neighborhood. Because of this 
reason edges are local features of an image (see [39] ) . Edge detection and image restoration 
problems in image processing are closely related to the jump surface estimation problem in 
statistics [501131] . 



In this article we consider an image as a bi-variate histogram data with pixels as bins 
where samples are drawn from a bi-variate density / defined on D = [0,1] x [0,1]. The 
unknown density / is assumed to lack global smoothness properties and may have various 
forms of discontinuities like jumps and edges. Let s = (x,y) £ T> denote a generic point in 
T> and suppose we have a sample £1, £2, • • • > £t from the unknown density /. Without loss 
of generality consider an M x M equispaced discretization of the domain T> into D = M 2 
pixels (squares) centered at sites si, S2, . . . , sjj where Sj's are some suitable enumeration of 
the (sites) given by 



i + 1/2 j + 1/2 



: < i,j < (M — 1) 



M ' M 

We shall denote the pixels by Sj CD. The choice of boundary of pixels Si is not crucial to our 
analysis as we work with continuous probability distributions. The collection of vertical and 
horizontal lines of the basic grid form a set of measure 0. However, while implementing the 
algorithm it is convenient to work with a consistent convention. For theoretical discussions 
we shall assume that Si's are open rectangles of the form (i, x (j, and hence they are 
disjoint. The set C = T>\ (Uj g g Si), consisting of the vertical and horizontal grid boundaries, 
is a set of measure . Finally, let Y(si),y(s2), . . . ,Y(sn) denote the counts in different 
subsets Si, S2, • • • , Sd respectively. The joint distribution of the observation vector Y = 
( y(si), y(s2), . . . , Y(s£)) ) is a multinomial distribution with parameters (T, II f) where 77/ 
is a probability mass function indexed by Sj with 



n f (s i )=Pv f {^eS i }= f /(s) ds, (l.l) 



for i = 1, 2, . . . , D. Here ds indicates integration with respect to the Lebesgue measure on 
T>. Note that II f is a probability mass function over Q which is the finest parametrization of 
the unknown density /, that is, given the discretization / cannot be recovered at resolutions 
finer than the pixel level averages. 



The model given by (1.1) is slightly different from the traditional function estimation 
model with additive Gaussian white noise |ll|15|4i] . The common variance of the errors 
determines the precision of the image. The formulation considered here also adheres to the 
principles of digital image processing [39J . The total number of pixels M 2 is indicative of the 
designed precision of the imaging device. For satellite imagery in different spectral channels 
M relates to the gridding of the target area on the ground for which image is supposed to 
be sufficiently precise (such as 3 x 3m 2 or 6 x 6m 2 ). The more precise the technology the 
larger value is assumed by M (also lesser bias in the estimation of /). On the other hand the 
total number of samples T relates to other noisy disturbance present in the channel. The 
lesser the noise the larger would be the value of T. We discuss it in more detail in section 2. 
From asymptotic considerations it has been shown that both formulations lead to equivalent 
sequence of experiments in the sense of Le Cam's measure of deficiency |6|36j . There is one 
glitch though. The drift function in the additive version of the density estimation problem 
becomes Z 1 ^ 2 . 



1.2 Semi-parametric mixture model for / 

Towards this let N(s) C T> denote an open square with center s £ T>. These open squares 
form a basis for the topology of the interior (0,1) x (0,1). Because the boundaries are 



assumed to have measure zero (as we assume the existence of density under Lebesgue 
measure) it is enough to restrict to the Borel a— field of int(P) for approximating image 
intensity function /. The extension to the whole of T> is straightforward by treating T> as 
a topological subspace of R 2 [35]. Let O denote the collection of open squares of the form 
N(s). 

For any finite collection Ni, N2, ■ ■ ■ , Nf- G O and any compact set A C UjiVj, let {(pi, N±), 
(p2, N2), • • • , (pfe, -/Vfc)} be a smooth partition of unity subordinate to the collection of open 
neighborhoods Ni's (c/. Theorem 3-11, [35] for its existence and properties). If the context 
is clear a partition of unity will be denoted by (pi,p2 5 - ■ ■ , Pfc) to keep notations simple. 
Specifically, pi's are kernel- like smooth functions defined on T> with pi vanishing outside iVj. 
The following properties are very useful. 

supp pi C Ni, 

< pi(s) + p 2 (s) + . . . + p fe (s) < 1 for all s G Z>, (1.2) 
pi(s) + p 2 (s) + . . . + Pfe(s) = 1 for all s G A 

Define po(s) = 1 - Z^Pi(s), for all s G £>. Then p (s) + pi(s) + . . . + Pfc(s) = 1 for 
every s G V. Note that po assumes the value on the compact set A. For the purpose of 
implementation any function defined on a continuous domain will be approximated by its 
average value at the finest resolution, that is, at the pixels Si's. 

The idea behind partition of unity is to decompose the domain of function into regions 
based on contrasting properties like local irregularities (such as high local oscillations or 
Holder continuity of index < 1) and global smoothness (such as uniform twice differen- 
tiability). Edges tend to appear as boundaries between these contrasting regions. For this 
purpose indicator type partitions are not suitable as the artifice jeopardize the properties we 
are looking for. On the other hand smooth partitions of unity do not disturb the smoothness 
profile. However we cannot make a crisp judgement regarding edges and boundaries (even 
without noise) as piPj = is not satisfied for % ^ j. More usefulness of smooth partitions of 
unity and other issues will be discussed in later sections. 

Given any non-negative, integrable function p defined on T> let \ p denote the measure 
on (£>, B) 

\ p (A) = [ p(s)ds forAeB, (1.3) 



J A 

where B is the Borel <r-field. As has already been observed we practically work on a finite 
sub-field of B generated by arbitrary unions of Si's, mostly rectangles of sub-images. If for 
another non-negative, integrable g, \ g is absolutely continuous with respect to \ p then it will 
be abbreviated as g«p (see [1]). For any density / and a partition of unity (pi, p 2 , . . . , Pk) 
the density can be localized into k pieces given by pif, i = 1, 2, . . . , k. Notice that Pi/<Pi- 
In case we can localize / appropriately by making a right choice of the partition of unity 
functions so that the the neighborhoods dominating the pj's cover points of discontinuity 
of / or /'. The remaining component po/ will extract the regular or smooth part of / from 
the data. On the basis of this basic principle we choose the semi-parametric model for the 
unknown density / by 

{k fe > 

y^at/i(s|0i) + a g(s) : fi(-\9i) <p;,#<PQ, for some (pi),^^ = 1 > ■ (1.4) 
i=l J 



The model given by (1.4) is an extension of usual mixture model due to the non- 
parametric component g and has an independent theoretical interest on its own. In this 
article (1.4) is adapted for certain specific application on edge- preservation and image 
restoration in mind. For comprehensive discussion on mixture models in classical statis- 
tics we refer to [33] ■ Here for each i, /j(s|0j) denote family of locally defined parametrized 
densities which play a crucial role in fitting edges. In this paper, we develop a statisti- 
cal methodology for edge detection and estimation by modeling the local features using 
a Local Template Model (LTM) based on low-dimensional exponential family where the 
sufficient statistics allow different types of edges. The dimension is kept low with compu- 
tational complexities in mind. The image domain T> is scanned by fitting the LTM over 
pixel windows of fixed size (we reported the results with 11 x 11 windows) and significant 
locations were captured by testing presence of edge in multiple locations using the Holms 
procedure |29|32j . The structure of (1.4) will be elaborated further in ensuing sections. The 
non-parametric component g in (1.4) provides information about the smooth part or the 
background luminosity of the image. There are various smoothing techniques such as, kernel, 
local polynomials, splines or other variational optimization principles |10|20|21|48| available 
in the literature for estimating functions of uniform smoothness. The main challenge here 
is extraction of the component in presence of local irregularities generated by embedded 
objects. This is reflected in the complementary portion of (1.4) which is represented by 
a localized mixture model. In the present article, after eliminating local irregularities by 
maximum likelihood based optimization, we transform the problem to spectral domain (by 
virtue of bi-variate Fourier transform) and estimate the non-parametric component g. We 
minimize the squared ^-distance between the empirical and theoretical Fourier coefficients 
using Lagrangian relaxation by Thin Plate Spline (TPS). Spectral optimization turns out 
to be particularly simple and provides a closed form expression in the form of a kernel con- 
volution with nonlinear bandwidth (that is, not a simple scaling) determined by the penalty 
parameters re-transformed using the inverse Fourier transform back to the spatial domain. 
In the one-dimensional case theoretical properties such as computation of mean integrated 
squared error (MISE), minimaxity and other optimality and implementation issues such as 
plug-in, cross-validation or thresholding properties have been studied in detail. See for ex- 
ample, [20,26]. In our experiments with some of the benchmark images the LTM based edge 
preservation and TPS relaxation give fairly accurate outputs even under noisy condition 
without any Gibb's like phenomenon or increased bias. The plug-in method used here to es- 
timate undetermined Lagrangian parameters provides an alternative to Efromovich-Pinsker 
Block shrinkage method for regularizing Fourier coefficients developed in |17|I18| . 



1.3 Other methods: wavelets and Bayesian 

There are two other frequently used methodology for edge detection and edge preserving im- 
age restoration in noisy cases. They are wavelets and Bayesian image modeling. We discuss 
some salient features, commonalities and differences with the proposed method very briefly. 
Wavelet based recovery from noisy signals have been extensively studied for this problem 
with significant amount of success |l|15|31j . A comprehensive theoretical treatment of ba- 
sic variational regularization of wavelets can be found in |10| . For wavelet based methods 
optimal approximation depends critically on the space of functions being interpolated. Typ- 
ically, smooth spaces involve Sobolev spaces and Besov spaces include more badly behaved 



functions (which are of interest here). Functions in these spaces can be well approximated 
by wavelets. However, the success of wavelet scheme critically depend on the filter banks and 
thresholding to attain consistency in noisy image without losing much in the interpolation 
problem without noise. Several thresholding schemes have been developed and sharpened 
over the years. Some of the basic issues are described in |14j . We also demonstrate below 
in Figure 1, how the benchmark functions of [H] can be constructed using the main model 



given by (1.4). As far as similarity goes the wavelet functions due to their location shifts 



and scaling at different resolutions are similar to searching the image domain T> by local 
templates. Although orthogonality is a bonus in terms of computation it is not possible to 
locally deform wavelets using local parametric templates retaining their orthogonality. Sev- 
eral improvements and newer ideas have been added to achieve this flexibility in the form 
of Gabor filter banks, kernel deformation, curvelets, contourlets among others |7|13|38j . 
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Fig. 1. Densities similar to the benchmark functions of [14] obtained through the model (1.4 1 



In the Bayesian image processing literature basic attempt is to model the pixel probabil- 
ity vector 77 in ( 1.1 ) using Markov random field priors [12 23.24J. The key idea is to compute 
the posterior of the image, however a computational variant called Maximum a Posteriori 
Probability (MAP) estimate is commonly used. The method has generalized to variational 



Bayesian learning (ensemble learning) and Bayesian networks using extensive use of MCMC 
and other sampling methods [3]. The construction of the main object of interest, that is, 
the prior for 77 is achieved by defining a suitable graphical structure of neighborhoods on 
the set Q and then use the cliques of the graph to define the so called Gibbs distribution 
|12j . The prior constructed in this manner can be used as the relaxation for the maximum 
likelihood variational problem. This poses a challenging question: how to construct suitable 
Markov random field on J 7 in (1.4)? 

In recent years, numerous smoothing procedures have been suggested in the statistical 
literature for edge-preserving function estimation from noisy data. General description can 
be found in the following references |21|26l34|41|48j . Several ID-methodology, have been 
discussed in literature. See references in [30] for details. A good overview of edge detection 
techniques has been discussed in [50J. There are many edge detection algorithms available 
in literature such as ISEF, Canny, Marr-Hildreth, Sobel, Kirsch, Laplal and Lapla2. A 
detailed and comparative study of these techniques can be found in [43]. Reviews of the 
state of the art techniques for edge and line oriented approaches to contour detection can 
be seen in |37| . Other edge preserving approaches in image processing have been discussed 
in literature |ll|28j and references therin. 

The rest of the paper is organized as follows. In section 2 we provide theoretical details 
and some statistical issues related to our problem. Section 3 describes the methodology 
which we follow. Section 4 is dedicated for the application of our methodology on Lenna's 
image. In Section 5 we analyze the errors and conclude in Section 6 with some limitations, 
modifications and possible extensions. Few rigorous theoretical proofs and the detailed al- 
gorithm can be seen in the Appendix. 



2 Theory basics 

2.1 Basic distribution theory 



Consider the multinomial model (1.1) and the empirical histogram descriptor defined on 
S := ufLi Si C V. The almost everywhere definition suffices as \{V\S) = (see section hi). 



D D 



f* mp (s) = -Y J Y(s i )I( S €S i ), (2.1) 



where /(•) denotes the indicator function. The expected image density, the main parameter 
of interest under the model, is given by 



D 



Tr f (s) = Dj2n f {s i )I(seS i ). (2.2) 



i=l 

Note that nf is provides the information on average intensity of the image at the pixels and 
thus represents the true image after theoretical noise removal. Under model assumptions (be 
it parametric or non-parametric) the theoretical expectation functional E^. is assumed to 
be unknown. The central goal of any statistical procedure is to reconstruct this functional 
from pixel site data Y and structural model assumptions involving unknown parameters and 



functions (in our case it is provided by (1.4)). The empirical evidence is summarized through 



/j, mp . If we estimate the target functional simply by E^p we get an under-smoothed es- 
timate missing out relevant substructures in noisy images and becoming extremely sensitive 
to minor degradation and noise, losing robustness altogether. 



With the above scaling for any integrable function p defined on V, in view of ( 1.1 ),(2.1 ) 



and (2.2) the expectation projection functionals can be calculated as follows. 



1 D 

E f emp(p) = - P( S i) Y ( S 



T 

D i_1 (2-3) 



i=l 



where p(sj) = D J g . p(s) ds = ( f s . p(s) ds) / ( J s , ds) for i = 1, 2, . . . , D. We interpret p 

as the image projection of original p. The quantities on the right hand side of (2.3) can be 
interpreted as discretized expectation functionals at the pixel sites Q . The basic projection 
identity helps us understanding how the class of estimators defined by linear functionals of 
densities defined on (£>,£>) reduce to linear functions of sufficient statistics {Y(si)} under 



the multinomial model (1.1) 



In this section we obtain some heuristic approximations which can be made into precise 
asymptotics if so desired. We also suppress the suffix T wherever possible for notational 
simplicity (both T and D = M 2 are given quantities). However we shall keep the fact in 
mind that both D and T are required to be large for greater precision in estimation. For 
any m x m square neighborhood N define its spatial bandwidth to be h m = (m/K). For 
a rectangular m x r neighborhood its spatial bandwidth will be (h m h r ) 1 / 2 . We implicitly 
assume the following in what follows. For any rectangular m x r neighborhood 

ah m h r < IJf(N) < Ah m h r , (2.4) 

for suitable global constants a, A > 0. 

A function p defined on T> is image measurable if it assumes constant values over pixels 
Si's. For an image measurable p, it can be easily seen p = p. We list some useful properties 
of multinomial counts in the following proposition. 

Proposition 2.1 (i) Let p be any image measurable function defined on D. Then 

(a)E f E f? n P ( p ) = E 7Tf (p) ±E nf (p) 
1 1 



(b)Var f (E /r -(p)) = f Var(p\n f ) := - ^.(p 2 ) - E^p)] (2.5) 

Var(p\n f ). 



a 1 



(ii) Let Ni C V be mxm pixel neighborhoods and pi be image measurable function vanishing 
outside Ni for i=l,2. Then 

Cov f (%™KPi),%™ p (P2)) = ^Cov(p 1 ,p 2 \n f ) = ^ (E 7Tf (p 1 p 2 ) - E 7r/ (pi)E 7r/ (p 2 )) 

= -Cov(pi,p 2 \LT f ) 

= ^{n f (N 1 N 2 )E nf ( PlP2 \\I Nl = I m = 1) 

-77 / (iV 1 )i7 / (iV 2 )E 77/ (p 1 ||/ 7 v 1 = l)E nf (p 2 \\I N2 = 1)}, 

(2.6) 

where In is the indicator variable attached to the neighborhood N and E(-||-) is the 
generic notation for conditional expectation of the first argument given the second. In the 
last line the conditional expectations are calculated under the discrete probability setup 
(Q,LJf) with the random variables pi,p 2 '■ Q — > K defined by their image measurability. 



2.2 Discretization of MISE 

The mean integrated squared error (MISE) has become a benchmark risk function in density 
estimation over the decades in statistical literature [20 2 1|26|44] . The special structure of 
MISE has its natural roots for the class of kernel smoothed density estimators and clarity 
of bias-variance trade-off. The ANOVA decomposition of squared error and its intrinsic 
connection with Euclidean geometry is another attraction. Due to squaring of the densities 
it gets less influenced by improbable values in comparison with L\ or Hellinger metrics. It 
is known that the L\ error is equivalent to the total variation distance which represents the 
worst case discrepancy in predicting events. However this makes the metric more suitable for 
other applications. Although various other error assessment of non-parametric procedures 
has become more popular in recent years particularly in areas of machine learning such as, 
unsupervised learning, classification, prediction, data mining, and support vector machines 
we choose the MISE as the risk function for the purpose of the present article |25|47| . 
We proceed with a useful discretization lemma below which gives us a significant re- 



duction in the class of estimators under the multinomial model (1.1). In the language of 
statistical decision theory the lemma tells us while summarizing the data by the empirical 
histogram f^ mp , histogram type smooths form a complete class under MISE [2j. We require 
a slight variation of the usual conditional version of the Jensen's inequality for this pur- 
pose and later. In order to keep presentation neat we state it separately as a lemma. The 
Jensen's inequality in its conditional form is essentially the basis of Rao-Blackwellization in 
statistics. For a proof see [I]. 

Lemma 2.1 Suppose g : L x L — >• M is a continuous function defined on a suitable interval 
L C M, such that g(-,t) is convex for every t £ L. Consider a random variable U and a 
random object V defined on the same probability space with with U having finite second 
moment. Assume that U takes values in L. Then 

E{g(U,r(V))\\V}>g(E(U\\V),T(V)) a.e. (2.7) 



where r is a L-valued function defined on the range of V. If g(-,t) is strictly convex for 
every t £ L, equality holds if and only if U is a function of V . 



The random object part may be made technically more precise if we recognize it as a sub 
o"-field and r being measurable with respect to that sub cr-field. However, we discuss only 
the heuristics here. 

Consider an arbitrary estimator fx(s;Y) of the image density 7r/(s). Under the model 
(1.4) /t(-;Y) takes values in T. For the particular estimator the MISE risk function is 
defined by 

R(f T ,f)=E f J (/ T (s;y)-E / /^(s)) 2 ds 
= E f f (f T (s;Y)-7T f (s)Y ds 



V 



(2. 



D 



fris^-DnfisM ds. 



Next conceive of a extraneous randomization (/, X) which is realized by choosing a random 
index 1 < / < D and then a random point X from the uniform distribution on S/. The 
data vector Y and the other random object. All these random objects can be defined on a 
suitable product probability space with expectation denoted by E^-. We apply Lemma (2.7) 
with U = X, V = (Y, I) and g(x) = x 2 , to obtain the following lower bound. 

R(f T , f) = D 2 E} (/ T (X; Y) - DII f ( Sl 



>D 2 E* f (E*{f T (X;Y)\\V} - Dn f ( Sl ] 



2.9) 



The subscript / is omitted in the inner expectation as it is conditional on the sufficient 
statistic Y, hence expectation is carried out only with respect to external randomization. 
It can be easily checked that for any i, E*j{f T (X; Y)\\I = i, Y} = ^ J s _ /t(s) ds^ / (f s . ds^ 

= D f s /^(s)ds. In other words the Rao-Blackwellized estimator is constant on the ele- 
mentary pixels SVs. Therefore we conclude for the estimation of vrj(s) = Ej/^ mp (s) the 
class of image measurable histogram estimators given by 

u = ( Ms) = n( Y ) J ( s G s : ti(X) > °> for a11 hJ2 = D > ae \ ( 2 - 10 ) 



=i i=i 



forms a complete class under the MISE given by (2.8) (viz. [2j). Under the basic semi- 
parametric model (1.4) the edge-preserving reconstructions fx £ J~- Unfortunately the 
LTM based model does not contain histogram like functions. However histograms form a 
dense subset in a larger space of densities (viz. |6ll4l 36j). In order to rectify this problem 
we modify the continuous version of MISE and adapt it to the current context. Given a 
discretization (digitization) {Si,i G Q} of a continuous (analogue) problem The discretized 
mean square error (DMSE) for an estimator fx under the data generating density / is 
defined as 

RdUt, /) = ^ % E t/H s ) - /(b)] ds)' . 

i=1 1 (2.11) 

D ^ > 

= -E / ^[77 T (s 4 )-7I/(s,)] 2 , 



i 



where i7r(sj) = j s /^(s)ds. The validity of the approximation in (2.11) can be natu- 
rally justified as a Riemann sum approximation of the MISE. Also, by virtue of the Rao- 
Blackwellization step in ( 2.9 ) DMSE provides a lower bound of MISE for any given partition 
{Si, i £ Q}. This is a natural adaptation of classical MISE under discretization and is defined 
for continuous estimates and data generating densities. It may also be seen that the class of 
histogram estimators defined by (2.10) remains essentially complete or risk equivalent (viz. 
[2]) with the class of general smooth estimators on the continuous (analogue) domain. The 
optimal rate for bivariate density estimation with kernel smoothing can be found in [49] . 



However, it should be noted that the underlying assumptions are smooth density models. 
The optimum rate achievable under wavelets for vrious function classes are described by 
|14|16j . However, in present situation without assuming any further smoothness assump- 
tions on the set of edges it is virtually impossible to conclude about rate. Some theoretical 
approach towards optimal rates will be taken up elsewhere. 



2.3 Variational optimization of the semi-parametric likelihood with TPS 
Lagrangian relaxation 

After a suitable reparametrization / E T can be represented as /(s) = po(s) go(s) + 
^2i=i Pi( s ) 9i{ s \@i) with respect to the associated partition of unity (pi) where <7j(s|#j)'s 
are defined by 

/ g i ( S \e i ) Pi (s)ds = a i f fi(s\6i) ds, (2.12) 
J A J A 

for measurable sets A £ 13 and i = 0,1, ... ,k. Because we are dealing with probability 
distributions we also get 

k 

/ po(s)5o(s)ds + / pi(s) gi(s\9i) ds = 1. 
Jv l=1 Jv 



This is obtained by invoking the absolute continuity conditions in (1.4). This requires an 
application of Radon-Nikodym theorem [3]. The variational optimization problem we pro- 
pose does not directly optimize the DMSE. In stead we consider the following upper bound 
to arrive at a more intuitive and computable criterion using separation of variables. The 
main unstructured part of the procedure is determination of the partitions of unity (pi, Ni), 
for i = 1, 2,- • • , k where k is also assumed to be unknown unknown. This step is achieved 
by scanning the image with a fixed rectangular neighborhood. At each placement a deci- 
sion problem is posed whether the the local parametric template with discontinuity should 
be fitted or not. The decision making is not done independently of the placement, rather 
a multiple testing formulation has been considered to detect the significant neighborhood 
whose union covers the unknown subset of discontinuities. Once p^s are selected the re- 
maining parameter (conditioned on the choice of partitions of unity) are 9 = (0\, Oi,- ■ ■ , 9^) 
and a smooth density estimator <?o( s ) of po(s)f^ mp (s). We give an intuitve justification for 
separation of variables using MISE. Although the results are also true for DMSE we omit 



that for notational inconvenience. 



R(f T ,f)=E f [ (/ T ( S )-/(s)) 2 ds 
Jv 

^/r(s) - Yl a ifi( s \di) ~ «o/o(s)^ ds 

jf ^po(s) (/r(s) - go(B)) + ^ pi(s) (/t(s) - gMei))\ ds 
<E// po(s) (/r(s)-fld(s)) ds + ^E// p;( B ) (/ T (s) - ft(s|0i)) ds. 



E, 



E, 



(2.13) 



The last inequality in (2.13) follows from Jensen's inequality for each s. This inequality 
gives us an useful upper bound for the original MISE. The main advantage is that the original 
mixture problem is split into k local approximation problems and a global smoothness 
problem (separation of variables). It also gives us a preference over the choice of (pi) (and 
the dominating neighborhoods). Note the the upper bound would be exact if pi's satisfied 
the point-wise orthogonality condition pk pi = for k ^ I. This is not achievable due 



to smoothness of the partitions. The inequality (2.13) suggests that the covering should 
be as tight as possible to make the bound close to the exact MISE. Note that for 1 < 
i > k the variational problem is parametric hence the natural criterion is logdikelihood. 



Therefore we replace the MISE part by the likelihood criterion in (2.13) to obtain the 
following minimization problem in (6, go). 

Q(e,9o\fr P ) = -J2 f ^(s)log 5l (s|^)/^(s)ds 

i=i Jv (2.14) 

+ / (p (s)/r P (s)-5o(s)) 2 ds. 
Jv 

Note that as if expected with unconstrained variational problems, without any reg- 
ularization the problem has a trivial solution, namely, po — 1> pi — 0, for % > 1 and 
po(s) f^ mp (s) — go(s) = 0. Therefore a regularization is required. For the present approach 
we choose the TPS regularization assuming go to be twice continuously differentiable. Let 
L>2 denote the second derivative matrix (the Hessian) of a function of two variables. The 
TPS regularization penalizes for ||I?2/|| 2 = J-p \ \ ^/(s) ||^ ds, where the norm inside the 
integration denotes the sum of squares of the entries of the Hessian (squared Frobenious 
norm). After adding a suitable Lagrangian penalty parameter the final variational optimiza- 
tion problem with TPS regularization becomes 

(0,go)= argmin {Q(0, g \f* mp ) + A| |D 25o | | 2 } • (2.15) 

/G -F, g « po 



For the parametric part the optimization the discretization can be done using Theorem 2.1 
This reduces the value of the objective function. Since (— logx ) is a convex function it can 



be readily verified that for each i, 



- [ Pi ( S ) log 5i ( S |^)/^(s)ds > -1 V log^SiJIeO^Bi), 

where <?(si) = (J* g , jOj(s) ds)/(fg Pi(s) ds) whenever the denominator is positive. 

One last remark about the absolute continuity of the nonparametric component go- We 
have found that the solution in this form is more stable as the solution for the unweighted 
spline go automatically satisfies go/ Po bounded. If there is any indication of possible sin- 
gularity, a convolution of the Fourier coefficients of go with Fourier coefficients of po serves 
the purpose. However the near singularity ( to be decided some tolerance level) might in- 
dicate something more serious, namely, certain irregular local neighborhoods of the image 
not being captured due to type II error of multiple testing procedure. 



2.4 Embedding in spectral domain 

In order to describe the image embedding to the spectral domain first we describe the 
embedding of one dimensional signals defined on a compact intervals to the spectral domain 
in some detail. For signals defined on two dimensional domains we take the tensor product 
of two one dimensional spectra. In this subsection we briefly develop the notations and 
mention some of the useful properties of spectral embedding of signals which will be helpful 
in understanding the methodology better. Denote the set of complex numbers by C and 
let T C C be the unit circle on the complex plane, T = {z G C : \z\ = 1}. This is 
a compact abelian group under the complex multiplication and the complex conjugacy 
satisfying z = z -1 on T. The unit circle is the basic construct for spectral analysis of 
signals. The natural map of the unit interval [0, 1] into T is the exponential map x t— >• z = 
exp(2-7rjx) = cos(27rx) + j sin(27rx), with j 2 = —1, defined by the Euler's formula. The 
Lebesgue measure on [0, 1] and the Haar measure on T are related by 2tt dx = dz {viz. |42j ) 
and moreover, for any integrable function f : T — > C the problem of integration can be 
transformed to the real line by the following formula. 

f f(z) dz = 2vr f 1 f(e 27Tjx ) dx. 
Jt Jo 

Note that g(x) = f(e 27rjx ) extends periodic function on the real line with period 2tt. 
The continuous homomorphisms of T into itself (the characters) are given by, 

f = {(f) n (z) = z ^ z n : n G Z}, (2.16) 

where Z denotes the set of integers and z° = 1. The sequence of functions {(j) n {z)} are 
mutually orthogonal with respect to dz and {(2ir)~ 1 ^ 2 (j) n (z) : n £ Z} forms a complete 
orthonormal basis of L%(T, dz). The Fourier coefficients of a function / G L%(T, dz) are 
given by 

L=~ I z- n f(z)dz, fornGZ. (2.17) 

The fundamental isometry property of the transform yields the following Fourier inversion 
formula and preservation of inner products and distances in -^(T, dz) (Plancherel theorem, 



Perseval's identity, viz. [12] )■ For f,g 6 L,2(T,dz) 



1 

2^ 



/co= jr /„*» 

n=— oo 

1 r 00 

— / f(z)g(z)dz = fn9n (2-18) 

n=-oo 

„ oo 

/ \f(z)-g(z)\ 2 dz = V \f n -g n \ 2 



For differentiable functions / : T — > C we shall denote the derivative along the circle at a 
point 2 = zq either by f z (zo) or gj/(zo) to distinguish it from the derivative in the complex 
plane d/dz. The definition being as 

fz(zo) = -^-{z ) = 2vrj hm 



dz hrxl h — 1 

where r> indicates convergence along T. If f z E L2(T,dz) the Fourier coefficients satisfy 

fz,n = nf n , for ii£Z. (2.19) 

From previous discussions we have already observed that Fourier coefficient contains the full 
information present in the signals which can be recovered by the inverse Fourier transform 



described by (2.18). We shall state one more important property of Fourier transforms 
(which is more crucial for density estimation problems) before we finish this discussion. If 
we have non-negative signals such as histograms, f^ mp , ttj or fx discussed section 2.1 one has 
one very useful property of Fourier transforms, that is, Bochner-Herglotz theorem, which 
states that the Fourier coefficients {/„ : n G Z} is a non- negative definite sequence. Moreover 
the converse is also true if /o = 1- Therefore spectral embedding of a one dimensional 

spatial signal / € -^([O, l],dx) is given by the sequence / = : n G € izffi) where 

/ G L2(T, dz) is defined by f{z) = f(x) for z = exp(2irjx). There is some ambiguity in the 
definition at z = 1, however that is a set of measure zero and any choice can be made. 

Next we describe how to discretize of T using the orientation of the unit circle which 
can be implemented in a fairly simple manner. Suppose we want to subdivide the unit circle 
into K arcs of equal length and put a pointer to the centers of the arcs as well. For that we 
consider the set of 2-fT-th roots of unity, that is, fi = {z : z 2K = 1}. Let U2K = exp(-7rj'/i^) 
be the 2K-th root of unity which is nearest to 1 in terms of arc-length in the counter- 
clock wise direction. Then fi can be enumerated as {uj^k : ^ = 0, 1,- • • , (2K — 1)}. In this 
enumeration the subset given by Q dd = WTk • m °dd }> consisting of K elements point to 
the centers of the K arc intervals T = {(w|^, w^ +2 ) ■ m = 0,1,- ■ ■ , (K - 1)}. Note that for 
the last interval the end point is uj 2 k = 1- It is also interesting to note that the end points 
corresponding to the even powers of U2K are also K-th. roots of unity. This is an important 
observation because this property is used non-trivially in computing the recursions for the 
fast Fourier transform (FFT) algorithm for the discrete Fourier transform (DFT). 

Finally, the embedding of the spatial domain T> of the image begin with the torus, 
T 2 = T x T. The homomorphisms of T 2 are given by (z, w) i— > (z m , w n ) for (m, n) £ Z x Z. 
All the properties of Fourier transform on T carries through for the bivariate case. Following 



the convention in section [TJ a generic point on T 2 will be denoted by u = (z,w). Once the 
spatial domain is embedded to the torus the pixel centers Sj is mapped to a point on the 
torus of the form u>i = (k^M^' ^w^) ^ or some < k,l < (M — 1). The pixel rectangles 
Sj's are smoothly mapped onto T 2 -rectangles of the form fij = U x V where U, V 6 T for 
each i = 1,2,- -,D. The group structure in T 2 is specified by U\U2 = (ziZ2,wiW2) for 
u±,U2 € T 2 . The conjugacy operation is u\ = (zi,wi), and the inverse is u^ 1 = oJ\. The 
natural metric in this group is defined by 

||wiw 2 || = V\^2 ~ 1| 2 + \wiw 2 - 1| 2 . (2.20) 
Finally let open balls centred at point u with radius r be denoted by B r (u). 



3 Methodology 

In this section we describe the detailed methodology of finding the edge-preserving smooth 



density estimate from the class J- defined in (1.4). We begin by describing the Local Template 



Model (LTM), for testing the presence of an edge in the local region of interest. We shift 
the LTM over the entire image, keeping track of the p- values for the tests. Multiple testing 
is performed to determine the regions iVj having the edges at level a. After obtaining the 
iVj's, a linear programming problem is solved to obtain, the optimal partition of unity. The 
local regions are estimated through the LTM, while the density estimate for the remaining 
smooth region is then obtained through the TPS regularization using the Fourier basis. 



3.1 Local Template Model (LTM) 

Since we have explained the Fourier technique in the spectral domain in section 2.4, for sake 
of uniformity in notation, we explain this parametric model in the spectral domain as well. 
As mentioned in the previous sections, this model is used for both edge detection and as 
well as local density estimation. Let N{uq) denote the an open neighborhood of size mxm 
pixels (m = 2£ + 1) in T 2 having center at uq . N(u>q) can be thought of as the image of an 
analogous planar neighborhood A^(so) which can explicitly written as 

, r , \ fk - t io + t + 1 \ ( j - t j + 1 + 1 



M M J V M M 

where so = { ^ 0+ ^/ 2 , ^°^ 2 j • Note that we can find r\, r 2 > such that B n (uso) C iV(u>o) C 
B r2 (uo) for the continuous domain. 

Let /?(•) denote a smooth non-negative function with support(/9) C N(cjo), and E(p) > 
with respect to the Haar measure (du = dz x dw) on T 2 . Since the Lebesgue measure 



does not change by the spectral transformation, we can define a measure following (1.3) on 
(T 2 ,S T 2), (where Bf2 is the Borel a-field on T 2 ) as 



X p (B)= / p(u)du forBG6 T 2. (3.1) 
Jb 



Note that X p acts as a local smoothing measure at ujq which is absolutely continuous with 
respect to A, the Lebesgue measure on T 2 . Thus with respect to X p we can define a model 
p(us\(3, t/) on T 2 by, 

p(w|/3, r?) cx exp {0T{w) + r,\0T{u)\) (3.2) 



where T is a function denned with range (— 7r,7r), given by T(u>) = (t(z),t(w)) where 
t(z) = x — 2ttI(x > ir), if z = ex.p(2njx) for x £ [0, 1]. Note that the definition is ambiguous 
at z = —1 or w = —1, that is, ({—1} x T) U (T x { — 1}), which is a set of measure zero. 
Furthermore we have, 

/ p(u>\P, r))du) = 1 for all (3 6 R 2 , 77 G R (3.3) 
Jt 2 



We use this to construct local template supported on N(u)q). These models (3.2 ) are scalable 
and in a local neighborhood N(u)q) it can be scaled using the support function p in the 
following manner. 

p(u\(3, rj) = exp{/3'T(^ x ) + r, \0T{uu^)\ - d((3, r,)} X p(u>) (3.4) 

We call this model, the Local Template Model on N(uq). Note that we have used this 
model to test for edges, because it has the capability to capture the several different kinds 
of discontinuities. Observe that if r] = we have a continuous function, while the presence 
of r\ shows the evidence of the discontinuity. The different kinds of discontinuities captured 
by this model, is shown in Figure 2 for varying values of r/). Note that all directional 
edges can be identified through our model. It is also possible to identify other types of edges 
such as the 'Y' shaped edge by increasing the parameters in the model. 




Fig. 2. Different Kinds of Edges detected by the Local Template Model 

We first use this model to test for edges at OJn- In order to test for edges, we test 
Hq : rj = vs. Hi : 7/ ^ 0. We perform usual likelihood ratio test [32] and calculate the 
p-value. We shift the LTM over the whole image and keep track of all the p-values. Using 
these collection of p-values, we perform multiple testing at level a by using the Holms 
procedure [29] to finally obtain the neighborhoods containing the edges. Thus, we obtain 



the regions, say N%, N2, ■ ■ ■ , which contain edges at each of its center. Let us denote the 
support function in each of these neighborhoods by pi for % G {1, . . . k}. Note that each pi 
takes function value outside of Ni for i G {1, . . . , k). 

We try to fit the LTM on these neighborhoods. The LTM p acts as a discrete multinomial 
distribution on N(cjq), when restricted to the pixel sites wi,W2,...,Wd. Thus, denoting the 
particular pi by p for ease of notation, we have, 

d((3, V )=logl Yl /'Nxexp{/3'T(^ 1 ) + I] |/3'T(^ 1 )|} (3.5) 

Denoting S = Y^u-eN(u ) Y(ui)p(tOi), we have the likelihood function as 
L(J3,r,\Y)<x [J P("i\P,v) Y{ " l)p{ "* )/S 

log(L(/3, r,\Y)) = C+ (P'T^o 1 ) + 7?|/3'T(<W)I) x Y ^P^ _ d(/3j ^ 

(3.6) 



where C is a constant. Thus we maximize the last equation in ( |3.6[ ) to obtain (/3, fj). Denoting 
this as (f3i,rji) for the i-th neighborhood, our local estimate is p(-\f3i, f/i). Transforming this 
back to the spatial domain, we obtain the ML estimated fit fi(-\0i), of (|1.4[) for i = 1, . . . , k. 



3.2 Optimal Partitions of Unity 

Using the LTM as explained in the previous subsection, we obtain N\,N2, ■ ■ ■ ,Nk as the 
significant edge covering neighborhoods. The local functions are denoted by pi for i G 
{1, . . . k}. Using these local functions we create a continuous function on the whole image, 
such that it takes values on iVj for i G {1, . . . k] and outside. In order to create such a 
function we scale each pi by a constant Qj and consider their linear sum. The value of aj 
for i G {1, . . . k} is obtained by solving an optimization problem, which can be stated as 
follows. 

Find Oj for i G {1, ... k} in order to 
Maximize t 

subject to Ya=i aiPii^oj) >t Vj G 1, . . . , k 
Yn=i aiPi(u> 0j ) < 1 Vj G 1, . . . , k 
and cti > Vi G 1 , . . . , k 

where ujQj denotes the center of the Nj for j = 1, . . . , k. 

Denoting the solution to this optimization process as dj for i G {1, . . . k} we get a 
function covering the edge points as P e = Yl\=i &iPi- This is a continuous function defined 
on the whole space which covers all the detected edges and takes the maximum value of 1. 
P s = 1 — P e , correspondingly covers the smooth region of the image. Furthermore, {P e ,P s } 
forms the optimal partition of unity |35J. Thus the final smooth edge-preserving estimate 
can be written as 

f = PJ + PJ (3.7) 



where P e f covers the local features and P s f denotes the smooth estimate. 

As explained in the previous subsection, if we denote the local estimate of fpi by f% < 
as in (1.4), then P e f is given by 

k k 

Pef = ^ &ifPi = ^ Oifi{-\0i) 



(3.8) 



i=i 



which form the first part of the estimate as in (1.4). Now with this estimate in hand, we 
proceed to estimate the smooth region using the method of TPS regularization via Fourier 
basis. 



3.3 Thin Plate Spline (TPS) Regularization via Fourier basis 

We have already discussed the details of the Fourier theory as well as the embedding of the 
image in the spectral domain in section 2.4. Following the notation therein, we proceed to 
solve the problem of smooth density estimation by minimizing (2.15). 

Following the univariate definition in (2.17), we can write the Fourier coefficients Uj-i of 
a function / £ L2CT 2 , da;) as, 



Ukl 



[ z' k w' l f(u)du for k,le 

47T 2 J T 2 



(3.9) 



Now, square summability of the solution of (2.15) implies the existence of a density / by 
the Fourier Inversion Theorem. Thus, by inverse Fourier transformation and considering the 
density to be real, we can say 



00 00 



u k ,z k w l 



m = E E 

k=—oo l=—o 
oo / — 

= E ( E u k: iz k w l + u kfi z k + ^u k) iz k w l 



fc=— 00 \l=— 00 1=1 

Now using = UkJ\ U-k,l = Uk-h the above is simplified to, 



/(«) = n , + 2 x ^ I E Uk $ zk + E u °> iwl + E E u Ki zkwl + E E Uk - izkyo 1 



vfc=l 



1=1 



k=l 1=1 



k=l 1=1 



(3.10) 

where M(z) denotes the real part of the complex number z. 

Thus, the penalty term becomes, A L 2 (/| 2 + fww + ^fzw) do; and the problem of density 
estimation now becomes the minimization problem of 

^ ^ |po(w) fT mp (") - f(u)\ 2 du + \J^ {f zz + f ww + 2/i) du (3.11) 

in the class L 2 (T 2 ,du;). Here we retain the same notation of the empirical even after em- 
bedding into T 2 . Now putting u k i = x k 1 +jy k 1, the problem in the thin plate spline format 



becomes the minimization problem of 



oo oo 



2 ~ Uk ^ + X )Yj k4 ( x l,o + Vk,o) + Yj /4 ^o,i + vl,i 

c=— oo Z=— oo l.fc=l fc=l 

f oo oo 

+A i E E( fc2 + ^ 2 ) 2 x (4* + I'm + x l-i + 



(3.12) 



Thus, we get a 1-1 correspondence between the curve fitting problem via thin plate spline 
regularization and the density estimation problem using the Fourier basis. The solution to 
this minimization problem is given as a theorem below. 



Theorem 3.1. Solution to the minimization problem given by equation (3.12) above gives 
rise to a kernel like density estimate given by 



OO ~ h. OO ~ / OO OO ~ h 1 , ~ h — / 



/M = «o,o + 2H [EtTx ¥ + 22tTx F + — 



- + \k A ^1 + AZ 4 1 + A(fc 2 + Z 2 ) 2 

vfe=l 1=1 k=l 1=1 y ' 



Proof. See Appendix. 



The optimal value of A is obtained via a grid search, which minimizes, the integrated 
mean squared error. That is, we choose that value of A for which, 

/ E\f x {u)-f(cj)\ 2 du (3.13) 

is minimized. The estimate of the smooth region in D is obtained by the inverse transfor- 
mation of the density estimate of Theorem (3.1). 



4 Implementation 

In this section we apply our methodology on the image of Lenna of size 512 x 512. We have 
chosen the Local Template Model on an 11 x 11 square. We first define a function p on the 
window and transform it into our required function p on T> and subsequently on T 2 . The 
choice of the local function p on the 11 x 11 window is delicate. We try to put maximum 
weight on the center and gradually decrease it. The function p is defined as the convolution 
of two functions hi and hi each defined on a 6 x 6 window. That is 



/5(ni,n 2 )= Y X/ h 1 (ki,k 2 )h 2 {n 1 - k!,n 2 - k 2 ) 

fel=— oo k2=— oo 

for ni,n 2 G {1, . . . , 11}. The functions hi and h 2 are defined as follows. 



(4.1) 



hi(ni,n 2 ) 



ni,n 2 {2, . . . ,5} 

0.5 m = {2,5} n 2 = {2,..., 5} 

0.5 n 2 = {2,5} m = {2,..., 5} 

1 otherwise 



(4.2) 



which is a trapezoid on the 6x6 window. Denoting the center (3.5, 3.5) as (ci, c 2 ), we define, 

' exp I - tan ( ) * } x exp { - t an ( ) * 





h 2 (n 1 ,n 2 ) 



ni,n 2 G {1, ... ,6} 

otherwise 

, (4-3) 

We have chosen the value of r = 5. However, this choice can be changed and is left upto 
the user, r controls the spread of the final convoluted function p. Figure 3, shows the graph 
of p. This function is rescaled into V and subsequently to T 2 to get the local function p. 




Fig. 3. (a) The trapezoid function on square of size 6x6; (b) The function h(x, y) on a square of size 6x6; 
(c) The final convoluted function p{x,y) for the 11 x 11 window 

Using this local function p in the LTM, we execute the edge detection algorithm as 
explained in section 3.1. We start with the center at pixel position (6,6) and shift the 
template by 3 pixels for the next iteration. We keep track of all the p-values from the LRT, 
and finally obtain the edge pixels using the Holms procedure of multiple testing at level 
a = 0.01. Note that the extreme borders of the image are inherently considered as edges. 
Using the detected and inherent windows we create the partition of unity by solving the 
linear programming problem stated in section 3.2. Let us denote the function covering the 
detected edge pixels by P e . Now, instead of showing just the centre of the detected window 
as edge points, we perform a neat trick to get the edge lines. 

For each detected edge pixel uq, we mark the pixels in iV(cJo) which lie on the straight 
line, (3 (u — uo), where (3 are obtained by maximizing the likelihood in the LTM model 
restricted to N{uq). This gives the direction of the edge in N{uq). Now we put additional 
weights on these pixel positions by the function P e . It has been seen experimentally, that 
all points with function value greater than 0.8 gives the best visual representation of the 
edge locations. Figure 4 shows the original image of Lenna, followed by the detected edges 
in Figure 5. 

We fit the LTM on the detected edges, to get the edge estimates using the method 
explained in Section 3.1. The fitted edges are shown in Figure 6. The remaining image, i.e. 
P s f (shown in figure 7), is estimated using the TPS regularization via the Fourier basis. 




Fig. 4. Original Image of Lenna Fig. 5. Image after edge detection using LTM 

The black patches in Figure 7 denotes the regions of the edges which have been omitted in 
the Fourier application to prevent Gibbs phenomenon [27] from occurring. 




Fig. 6. Fitted Edges using the Local Template Model Fig. 7. Input image for the Fourier technique 

Figure 8 shows the output from the Fourier technique as explained in Section 3.3. The 
final smooth edge-preserving density estimate is shown in Figure 9. This is obtained by 
adding the two estimates, viz. P e f + P s f . 

5 Experimental Results 

5.1 Outputs from SSH Algorithm on Noisy Images 

A quick browse through the literature in this field shows that there have been several meth- 
ods which have been proposed for dealing with edge-preserving function estimation for noisy 
images. Few such methods are based on M Estimators [11], TM Smoothers [28] . 2-D dis- 
crete wavelet transforms [9], etc. The fact against using Fourier transform was that while 
it smoothens the noise, the edges are not well preserved because of the Gibbs phenomenon. 
However, in our methodology, since we separate the edges before going into the TPS reg- 
ularization based on Fourier basis, it is only logical to see how well, our method works for 
noisy images. 



Fig. 8. The estimate through Penalized bivariate Fig. 9. Final Smooth Edge-Preserving Density Esti- 
Fourier Transform mate 



Instead of adding artificial salt and pepper noise to the data, we create noisy images 
through repeated simulations by Gibbs sampling. We consider the original image to be a 
bi-variate distribution and using Gibbs sampling we draw a large enough sample of size 
n = 50 x image size, from this distribution. The frequency plot of these points, give us an 
image. We consider this as the noisy image and proceed with our methodology. The initial 
original and noisy images are shown in Figure 10. 




Fig. 10. The first row shows the original images. The second row shows the images with noise from the 
Gibbs sampling 



Using these noisy images we execute our algorithm. The final results are shown in Figures 
11,12 and 13. Note that the edge detection methodology captures most of the noise and 
removes it from the image during edge estimation. As a result when applying the Fourier 
transformations, the noisy regions are skipped. This results in a smooth and less noisy edge- 
preserving image in the final reconstruction, as it can be seen from the final estimates of 
each of the three figures. Hence, our image reconstruction methodology is somewhat robust 
to artificial noise, which is not the case for the direct use of the Fourier transformation. 




Fig. 11. (a) Edges Estimated in the Noisy Image of Lenna; (b) The Smooth Estimate through Fourier 
transform; (c) The final edge-preserving smooth density Estimate 





Fig. 12. (a) Edges Estimated in the Noisy Image of African Elephant; (b) The Smooth Estimate through 
Fourier transform; (c) The final edge-preserving smooth density Estimate 



5.2 Error Analysis 

As explained in the previous subsection, using the Gibbs Sampling technique we draw a 
sample of size T from the original bi-variate distribution. These T points give us an image. 
We repeat this procedure to obtain N = 100 such images. For each of these images we 
estimate the final edge-preserving smooth density. The choice of the sample size in our 





simulations is an important factor. Considering the image to be of size k x k we choose the 
sample sizes &sT = mxkxk where m = 10, 20, 50 and 100. 

In this subsection we calculate the discretized mean square error (DMSE), introduced in 
(2.11), of this estimate and tabulate it in Table 1. We also calculate and report the within 
sample variance of the estimate in Table 2. Furthermore, the ratio of the DMSE to the 
within sample variance gives us an idea about the coefficient of variation which we report 
in Table 3. 



Table 1. Discretized Mean Square Error (DMSE) 



Sample Size 


Lenna's 


African 


Eiffel 


m (T = mk 2 ) 


Image 


Elephant 


Tower 


10 


0.1773 


0.1643 


0.1803 


20 


0.1164 


0.1034 


0.1277 


50 


0.0834 


0.0758 


0.0878 


100 


0.0712 


0.0685 


0.0731 



Table 2. Within sample variance for the different images 



Sample Size 


Lenna's 


African 


Eiffel 


m (T = mk 2 ) 


Image 


Elephant 


Tower 


10 


0.0936 


0.0901 


0.1006 


20 


0.0452 


0.0395 


0.0531 


50 


0.0179 


0.0121 


0.0231 


100 


0.0090 


0.0089 


0.0102 



We also report the average time taken by our SSH Algorithm. All our simulations have 
been performed in MATLAB 7.10.0 on Windows platform, with a 4GB RAM machine and 
Intel Core 2 Duo 2.5 GHz Processor. The average time for detecting the edges in a 512 x 512 
image using a window of size 11 x 11 took about 45 seconds, followed by the local template 
fitting accounting for another 30 seconds. During this time, we simultaneously create the 



Table 3. Ratio of DMSE to Within Sample Variance 



Sample Size 


Lenna's 


African 


Eiffel 


m (T = mk 2 ) 


Image 


Elephant 


Tower 


10 


1.8950 


1.8235 


1.7922 


20 


2.5777 


2.6177 


2.4048 


50 


4.6667 


6.2644 


3.8008 


100 


7.9427 


7.6966 


7.1667 



optimal partition of unity. The Fourier methodology via TPS regularization took about 2 
mins to complete. Thereby we get the whole density estimate in about 3 minute and 10 
seconds on the average. 

6 Some Concluding Remarks 

In this paper, we have discussed an unsupervised novel statistical methodology for image 
restoration and compression by using local parametric mixtures and Lagrangian relaxation. 
Considering the thin plate spline regularization we showed the equivalence of the density 
estimation problem to the TPS problem. Since Fourier compression fails to restore a piece- 
wise smooth image due to Gibbs phenomenon, we adapt our methodology by using the 
topological concept of partition of unity. 

We consider the image as a histogram data from a bi-variate distribution. Edge detection 
and estimation is performed through a local parametric model on a small window, which we 
term as the Local Template Model. Holms procedure of multiple testing is used to obtain 
the regions containing the edge pixels. The optimal partition of unity is created by proper 
scaling of the local weight functions. After edge estimation through the LTM, the remaining 
smooth region is estimated by the penalized bi-variate Fourier transform. The final estimate 
is obtained by adding the local and the smooth estimates. 

We implement our algorithm on the image of Lenna, and we are able to get the output 
in two distinct channels, viz. the edge pixels and the Fourier coefficients with the thresh- 
olding sequence. We have also performed an error analysis of our methodology. The final 
reconstructed image is smooth and edge-preserving, even in the presence of 'salt and pep- 
per' noise. Using repeated simulations, we have also calculated the coefficient of variation 
for each of the three layers of the output - the edge pixels, the Fourier smoothing and the 
final restored image. 

One of the practical usefulness of Fourier transforms depend on their fast convergence to 



the limit in (2.18) which depends on the rate of decay of Fourier coefficients. As can be seen 



form (2.19), the roughness in the signal reflected by high L2 norm of the derivatives creates a 



hindrance for quick approximation. The approximations also tend to be under-smoothed in 



case a large number of terms need to be added in the right hand side of (2.18) for an accurate 
approximation for noisy signals. Fast decaying filters applied to the raw coefficients increases 
the bias. The worst case scenario occurs when the original signal is piece- wise smooth with 
jump discontinuities. In such cases the approximation suffers from what is known as Gibbs 
phenomenon. Details of Gibbs phenomenon for the standard saw-tooth signals can be found 
in [2? 30J. One of the objections raised against spectral embedding of spatial signals is that 
the decay of the coefficients slows down due to unknown location of discontinuities. One 
known discontinuity always exists at z = 1 unless the signal is periodic. This is called edge 



effect. It is not a difficult task to remove the local ripple due to the edge discontinuity. One 
of the novelty of the method presented in this article is to demonstrate that we can extract 
the smooth periodic part of the signal after eliminating unknown number of local ripples 
after searching through the significant the signal using the parametric local templates quite 
effectively. The Fourier approximation performs quite well on the extracted smooth periodic 
portion of the signal even under noisy environments. 

Although our methodology works well, it has few a limitation. The parametric model 
that we have chosen for the LTM, has the ability to capture any linear directional curved 
edges, as seen in Figure 1. However, in to capture a 'Y' shaped edge we need to add more 
parameters to the model. Furthermore, our edge estimation technique fails to capture neat 
edges in the presence of noise. But, that is not a hindrance, as the final reconstructed image 
even in the presence of noise is a smooth and edge preserving estimate, which was our initial 
goal. 

We hope this work helps future researchers in the endeavors especially in the field of 
image restoration and compression. 

Appendix 



In this appendix we provide the detailed proof of Theorem 3.1 and the complete algorithm 
for the edge-preserving smooth density estimate. 

Proof of Theorem 13.11 

Proof. We need to minimize, 
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Using u^k-i = u k.i',U-k,i = Uk-u the first term of the above expression can be simplified 
as follows 



y~] \uk,i-u k j\ 2 = ^2 ^2\uk,-l - uk,-i\ 2 + E \uk,o - u k ,o\ 2 + ^ y~] \uk,i - u k 



2 

l\ 



k,l=—oo k=— oo 1=1 k=—oo k=—oo 1=1 

oo oo oo oo oo 



\u-k-i - u^k-i? + y, \uq,-i - uo-i\ 2 + y y 

1=1 k=l 1=1 1=1 k=l 

oo oo 

+ \u-k,o - u- kfi \ 2 + |u 0) o - uo,o\ 2 + \Uk,0 ~ Uk,o\ 2 

k=l k=l 

oo oo oo oo oo 

+ E E \u-k,l - ?i-fc,H 2 + ^2 \U0,l ~ U0d\ 2 + ^2 ^2 ~~ U k,< 

1=1 k=l 1=1 1=1 k=l 



Uk-l ~ Uk,-i\ 2 



Thus we get, 
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Let u^i = cos(k9 + Z7) + jsm(/c0 + Z7) and = a;^ + jyk,l> where 0(0,7) denotes the 
mean of over both and 7. Now the problem reduces to minimizing, 
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Differentiating this with respect to Xk,i and y^/ and for each k, I, and equating to zero, we 
get, 
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Thus the final estimate using ( |3. 10 ) is 
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Hence, proved. 



Implementation Algorithm 



We now give a detailed algorithm of our implementation to obtain the edge preserving 
smooth density estimate. 

Data: The grey value image I 
sz = Size of the image matrix; 
N = 5, To create the 11x11 matrix for the LTM; 
Create the function p on the 11x11 window and store in p; 
for i from N + 1 to sz\ — N — 1 with a jump of N/ 2 do 
for j from N + 1 to sz<i — N — 1 with a jump of N/2 do 
Put center = (i,j); 



Perform likelihood ratio test for r\ = vs n ^ using the model in (3.4) and 
store result in pval(i,j); 
end 
end 

Execute the Holms procedure with these p- values and store the result in test. This 
variable test becomes a 0-1 matrix denoting all location with 1 as edges; 
Put 1 on all the extreme border pixels of the image; 
t\ = sum total of all elements of test; 

Solve the linear programming problem posed in section 3.2 by increasing t over a grid 
as : 0.005 : 1 and store the result in a which is a vector of length t±; 
Initialize g, P e as zero matrices of size sz; 
Initialize count = 1; 

for i from N + 1 to sz\ — N — 1 with a jump of N/ 2 do 
for j from N + 1 to SZ2 — N — 1 with a jump of N/2 do 
Put center = (i,j); 
if test(i,j) = 1 then 

Scount =sum of pixel values in / restricted to the 11x11 window with 
center at (i, j); 

Estimate the local density using the LTM and store it in f CO unt\ 
Increment g by a count x f count x S count ; 
Increment P e by a count x p; 
Increment count by 1; 
end 
end 
end 

Execute the TPS regularization based on Fourier basis as given in section 2 using the 
input image as P S I, where P s = 1 — P e and store the result in h\ 
The final estimate is / = g + h. 
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